A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields
Abstract
Recently, a number of methods have been proposed for semi-supervised clustering that employ supervision in the form of pairwise constraints. We describe a probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that incorporates relational supervision. The model leads to an EM-style clustering algorithm, the E-step of which requires collective assignment of instances to cluster centroids under the constraints. We evaluate three known techniques for such collective assignment: belief propagation, linear programming relaxation, and iterated conditional modes (ICM). The first two methods attempt to globally approximate the optimal assignment, while ICM is a greedy method. Experimental results indicate that global methods outperform the greedy approach when relational supervision is limited, while their benefits diminish as more pairwise constraints are provided.
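The greedy E-step mentioned in the abstract can be illustrated with a minimal ICM sketch. It is not the authors' implementation: the cost function (squared Euclidean distance plus a constant penalty `w` per violated constraint) and the names `icm_assign`, `must_link`, `cannot_link`, and `w` are assumptions made for illustration.

```python
import numpy as np

def icm_assign(X, centroids, must_link, cannot_link, w=1.0, max_sweeps=20):
    """Greedy ICM-style assignment (illustrative; cost form is assumed).

    Each point is reassigned in turn to the cluster that minimizes its
    distance to the centroid plus a penalty w for every pairwise
    constraint violated given the current labels of its neighbors.
    """
    n = len(X)
    # Squared Euclidean distance of every point to every centroid, shape (n, k).
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)  # start from the unconstrained assignment

    # Neighbor lists: for each point, the points it shares a constraint with.
    ml = [[] for _ in range(n)]
    cl = [[] for _ in range(n)]
    for i, j in must_link:
        ml[i].append(j)
        ml[j].append(i)
    for i, j in cannot_link:
        cl[i].append(j)
        cl[j].append(i)

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            cost = dist[i].copy()
            for j in ml[i]:
                # Must-link: penalize every label except the neighbor's.
                cost += w
                cost[labels[j]] -= w
            for j in cl[i]:
                # Cannot-link: penalize sharing the neighbor's label.
                cost[labels[j]] += w
            best = int(cost.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # a full sweep with no change: local optimum reached
            break
    return labels
```

Because each point is updated using only the current labels of its constrained neighbors, the result depends on the visiting order and can stop at a local optimum. That is the sense in which ICM is greedy, in contrast to belief propagation and linear programming relaxation, which approximate the joint assignment globally.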
Similar Resources
Combinatorial Markov Random Fields
A combinatorial random variable is a discrete random variable defined over a combinatorial set (e.g., a power set of a given set). In this paper we introduce combinatorial Markov random fields (Comrafs), which are Markov random fields where some of the nodes are combinatorial random variables. We argue that Comrafs are powerful models for unsupervised and semi-supervised learning. We put Comraf...
INRIA Research Project Proposal mistis Modelling and Inference of Complex and Structured Stochastic Systems
5 Domains of research; 5.1 Mixture models; 5.1.1 Learning and classification techniques; 5.1.2 Taking into account the curse of dimensionality; 5.2 Markov models; 5.2.1 Triplet Markov Fields f...
Clustering Heterogeneous Data with Mutual Semi-supervision
We propose a new methodology for clustering data comprising multiple domains or parts, in such a way that the separate domains mutually supervise each other within a semi-supervised learning framework. Unlike existing uses of semi-supervised learning, our methodology does not assume the presence of labels from part of the data, but rather, each of the different domains of the data separately un...
Probabilistic Semi-Supervised Clustering with Constraints
Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to the same or different clusters. In recent years, a number of algorithms have been proposed for enhancing clustering quality by employing such supervision. Such methods use the constraints to either modify the objective function, or to learn th...
Hidden Markov Random Fields Based LSI Text Semi-supervised Clustering
Semi-supervised learning is an active research field. Previous results have shown that incorporating background information into the original unsupervised clustering problem can achieve higher accuracy. In this paper, we explore the cooperation between the pairwise constraints given by the user and the semantic information in natural language. In addition, we reduce the time complexity to make the algorithm...